Optimized Mining of a Concise Representation for Frequent Patterns based on Disjunctions Rather than Conjunctions
Authors
Abstract
Exact condensed representations were introduced to offer a small set of elements from which all frequent patterns can be faithfully retrieved. In this paper, a new exact concise representation, based only on particular elements of the disjunctive search space, is introduced. In this space, a pattern is characterized by its disjunctive support, i.e., the frequency of complementary occurrences of its items, instead of the ubiquitous co-occurrence link. We mainly focus here on proposing an efficient tool for mining this representation. For this purpose, we introduce a dedicated algorithm, called DSSRM. We also propose several techniques to optimize its mining time and memory consumption. The empirical study carried out on benchmark datasets shows that DSSRM is faster by several orders of magnitude than MEP, the mining algorithm of the only other representation that uses disjunctions of items.

Introduction and motivations

Given a set of items and a set of transactions, the well-known frequent pattern mining problem consists of extracting, from a dataset, the patterns whose number of occurrences (i.e., conjunctive support, or simply support) is greater than or equal to a user-defined threshold (Agrawal and Srikant 1994). Unfortunately, in practice, the number of frequent patterns is overwhelmingly large, which hampers their effective exploitation by end-users. Considerable effort has therefore focused on defining a manageably-sized set of patterns from which all frequent patterns can be regenerated along with their exact supports. Such a set is commonly called an exact condensed (or concise) representation for frequent patterns. On the other hand, in many real-life applications, such as market basket analysis, medical data analysis, social network analysis and bioinformatics, the disjunctive connector linking items can bring key information as well as a way of summarizing the conveyed knowledge.
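The two notions of support contrasted above can be made concrete with a small sketch. It is only an illustration of the definitions, not code from the paper: the function names and the toy dataset `DATASET` are assumptions introduced here. The conjunctive support counts transactions containing every item of a pattern, while the disjunctive support counts transactions containing at least one of its items.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def conjunctive_support(pattern: Iterable[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain ALL items of the pattern."""
    p = frozenset(pattern)
    return sum(1 for t in transactions if p <= t)


def disjunctive_support(pattern: Iterable[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain AT LEAST ONE item of the pattern."""
    p = frozenset(pattern)
    return sum(1 for t in transactions if p & t)


# Hypothetical toy dataset (not taken from the paper): five transactions over items a-d.
DATASET: List[Transaction] = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"c"}),
    frozenset({"d"}),
    frozenset({"a", "d"}),
]

if __name__ == "__main__":
    print(conjunctive_support({"a", "b"}, DATASET))  # a AND b occur together
    print(disjunctive_support({"a", "b"}, DATASET))  # a OR b occurs
```

On this toy dataset the pattern {a, b} is supported conjunctively by two transactions and disjunctively by three, which shows how the disjunctive count can only grow as items are added to a pattern.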
An interesting solution is then offered by the concise representation based on the joint use of some particular disjunctive patterns, more precisely essential patterns (Casali, Cicchetti, and Lakhal 2005; Kryszkiewicz 2009) and disjunctive closed patterns (Hamrouni, Ben Yahia, and Mephu Nguifo 2009). This representation constitutes a basis for straightforwardly deriving the conjunctive, disjunctive and negative frequencies of a pattern without information loss. It also achieves interesting compactness rates when compared to the representations of the literature (see Hamrouni, Ben Yahia, and Mephu Nguifo 2009 for details). From an algorithmic point of view, the MEP algorithm (Casali, Cicchetti, and Lakhal 2005) was proposed for extracting the frequent essential pattern-based representation. On the other hand, little attention was paid to the mining performance of the disjunctive closed pattern-based representation, since the quantitative aspect was the main focus in (Hamrouni, Ben Yahia, and Mephu Nguifo 2009). In this paper, to fill this gap, we mainly focus on a new algorithm, called DSSRM, dedicated to an optimized extraction of disjunctive closed patterns. To the best of our knowledge, this algorithm is the first one that mines disjunctive patterns through a dedicated traversal of the disjunctive search space. The DSSRM algorithm relies on an efficient method that exploits the complement of a pattern w.r.t. the set of items of the dataset. We also provide a thorough discussion of several other optimization techniques used to speed it up.

∗This work was partially supported by the French-Tunisian project CMCU-Utique 05G1412. Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
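As a rough illustration of how the complement of a pattern and the derivation of the other frequencies can fit together, the sketch below uses two standard facts: a transaction contains no item of a pattern exactly when it is included in the pattern's complement, and a conjunctive support can be recovered from disjunctive supports by inclusion-exclusion. This is not the DSSRM procedure itself, whose details are not given in this excerpt; all function names, `ITEMS` and `DATASET` are hypothetical, and `ITEMS` is assumed to contain every item that appears in a transaction.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def disjunctive_support_via_complement(
    pattern: Iterable[str], transactions: List[Transaction], items: Iterable[str]
) -> int:
    """Count transactions hitting the pattern as |D| minus those inside its complement.

    Assumes `items` covers every item occurring in `transactions`.
    """
    complement = frozenset(items) - frozenset(pattern)
    missed = sum(1 for t in transactions if t <= complement)
    return len(transactions) - missed


def negative_support(
    pattern: Iterable[str], transactions: List[Transaction], items: Iterable[str]
) -> int:
    """Number of transactions containing none of the pattern's items."""
    return len(transactions) - disjunctive_support_via_complement(pattern, transactions, items)


def conjunctive_from_disjunctive(
    pattern: Iterable[str], disj: Callable[[FrozenSet[str]], int]
) -> int:
    """Inclusion-exclusion: alternating sum of disjunctive supports of non-empty subsets."""
    elems = sorted(frozenset(pattern))
    total = 0
    for k in range(1, len(elems) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(elems, k):
            total += sign * disj(frozenset(subset))
    return total


# Hypothetical toy data (not taken from the paper).
ITEMS = frozenset({"a", "b", "c", "d"})
DATASET: List[Transaction] = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"c"}),
    frozenset({"d"}),
    frozenset({"a", "d"}),
]


def toy_disj(pattern: FrozenSet[str]) -> int:
    """Disjunctive support on the toy dataset."""
    return disjunctive_support_via_complement(pattern, DATASET, ITEMS)
```

The inclusion-exclusion sum enumerates every non-empty subset of the pattern, so it is exponential in the pattern size; it is shown here only to make the "without information loss" claim tangible on small patterns.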
Then, we carry out a series of experiments on real-life datasets in order to: (i) assess the impact of the optimization of the disjunctive support computation adopted by DSSRM; and (ii) compare the performance of the DSSRM and MEP algorithms. The obtained results confirm, on the one hand, the added value of the optimization techniques we introduced and, on the other hand, that DSSRM outperforms MEP by several orders of magnitude.

The remainder of the paper is organized as follows. Section 2 presents the key notions used in this paper. In Section 3, we describe the main ideas of the proposed representation. Section 4 details the DSSRM algorithm, dedicated to the extraction of this representation. Several optimization techniques designed to improve its computational time and memory consumption are also described. In Sec…

DSSRM is the acronym of Disjunctive Search Space-based Representation Miner.

Proceedings of the Twenty-Third International Florida Artificial Intelligence Research Society Conference (FLAIRS 2010)
Similar resources
BLOSOM: A Framework for Mining Boolean Expressions
We introduce a novel framework, called BLOSOM, for mining (frequent) boolean expressions over binary-valued datasets. We organize the space of boolean expressions into four categories: pure conjunctions, pure disjunctions, conjunction of disjunctions, and disjunction of conjunctions. We focus on mining the simplest expressions (the minimal generators) for each class. We also propose a closure op...
Mining Frequent Boolean Expressions: Application to Gene Expression and Regulatory Modeling
Regulatory network analysis and other bioinformatics tasks require the ability to induce and represent arbitrary boolean expressions from data sources. In this paper, the authors introduce a novel framework called BLOSOM for mining (frequent) boolean expressions over binary-valued datasets. Boolean expressions can be grouped into four categories: pure conjunctions, pure disjunctions, conjunctio...
A Relational Approach for Discovering Frequent Patterns with Disjunctions
Traditional pattern discovery approaches make it possible to identify frequent patterns expressed in the form of conjunctions of items, which represent their frequent co-occurrences. Although such approaches have proved effective in descriptive knowledge discovery tasks, they can miss interesting combinations of items which do not necessarily occur together. To avoid this limitation, we propose a met...
BLOSOM: A Framework for Mining Arbitrary Boolean Expressions over Attribute Sets
We introduce a novel framework (BLOSOM) for mining (frequent) boolean expressions over binary-valued datasets. We organize the space of boolean expressions into four categories: pure conjunctions, pure disjunctions, conjunction of disjunctions, and disjunction of conjunctions. For each category, we propose a closure operator that naturally leads to the concept of a closed boolean expression. Th...
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
Journal title:
Volume, issue:
Pages: -
Publication date: 2010